i-Vector DNN Scoring and Calibration for Noise Robust Speaker Verification
نویسندگان
چکیده
This paper proposes applying multi-task learning to train deep neural networks (DNNs) for calibrating the PLDA scores of speaker verification systems under noisy environments. To facilitate the DNNs to learn the main task (calibration), several auxiliary tasks were introduced, including the prediction of SNR and duration from i-vectors and classifying whether an i-vector pair belongs to the same speaker or not. The possibility of replacing the PLDA model by a DNN during the scoring stage is also explored. Evaluations on noise contaminated speech suggest that the auxiliary tasks are important for the DNNs to learn the main calibration task and that the uncalibrated PLDA scores are an essential input to the DNNs. Without this input, the DNNs can only predict the score shifts accurately, suggesting that the PLDA model is indispensable.
منابع مشابه
Joint Training of Expanded End-to-End DNN for Text-Dependent Speaker Verification
We propose an expanded end-to-end DNN architecture for speaker verification based on b-vectors as well as d-vectors. We embedded the components of a speaker verification system such as modeling frame-level features, extracting utterance-level features, dimensionality reduction of utterancelevel features, and trial-level scoring in an expanded end-toend DNN architecture. The main contribution of...
متن کاملCosine distance features for robust speaker verification
We use similarities with people we know already as a means to enhance the speaker verification accuracy. Motivated by this, we use cosine distance similarities with a set of reference speakers, cosine distance features (CDF), to improve the performance of speaker verification systems for clean and additive noise test conditions. We used mel frequency cepstral coefficients, power normalized ceps...
متن کاملPhone-centric local variability vector for text-constrained speaker verification
This paper investigates the use of frame alignment given by a deep neural network (DNN) for text-constrained speaker verification task, where the lexical contents of the test utterances are limited to a finite set of vocabulary. The DNN makes use of information carried by the target and its contextual frames to assign it probabilistically to one of the phonetic states. The frame alignment is th...
متن کاملImproving Deep Neural Networks Based Speaker Verification Using Unlabeled Data
Recently, deep neural networks (DNNs) trained to predict senones have been incorporated into the conventional i-vector based speaker verification systems to provide soft frame alignments and show promising results. However, the data mismatch problem may degrade the performance since the DNN requires transcribed data (out-domain data) while the data sets (indomain data) used for i-vector trainin...
متن کاملLinear Regression for Speaker Verification
This paper presents a linear regression based backend for speaker verification. Linear regression is a simple linear model that minimizes the mean squared estimation error between the target and its estimate with a closed form solution, where the target is defined as the ground-truth indicator vectors of utterances. We use the linear regression model to learn speaker models from a front-end, an...
متن کامل